[core] Introduce BucketSelector based on partition values to achieve bucket level predicate push down #7486

Merged
JingsongLi merged 2 commits into apache:master from JingsongLi:BucketSelector_by_partition
Mar 20, 2026

Conversation

JingsongLi (Contributor) commented on Mar 19, 2026

Purpose

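Introduce a BucketSelector that uses partition values to push predicates down to the bucket level. For a compound predicate, the partition conditions are evaluated against each concrete partition value, and the bucket-key conditions that remain decide which buckets are read for that partition.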

Case 1: bucket filtering with compound predicates on a single-field bucket key.

Table schema:

  • Partition key: column 'a' (INT)
  • Bucket key: column 'b' (INT)
  • Bucket count: 10

Data distribution: 5 partitions (a=1 to 5) × 20 b-values (b=1 to 20) = 100 rows.

Scenarios:

  • Predicate: (a < 3 AND b = 5) OR (a = 3 AND b = 7) - Tests a partition range filter with bucket equality, combined with OR. Expected: the bucket for b=5 in partitions 1 and 2, and the bucket for b=7 in partition 3.
  • Predicate: (a < 3 AND b = 5) OR (a = 3 AND b < 100) - Tests partition range with bucket equality, OR partition equality with bucket range. Expected: mixed buckets from partition 3 (the range b < 100 does not pin a single bucket) and only the b=5 bucket from partitions 1 and 2.
  • Predicate: (a = 2 AND b = 5) OR (a = 3 AND b = 7) - Tests partition equality with bucket equality in both OR branches. Expected: exact bucket matching for each partition-b combination.
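
The per-partition reasoning behind these scenarios can be sketched as follows. This is a simplified illustration, not the BucketSelector added by this PR: the `Branch` class, `bucketsFor`, and the modulo-based `bucketOf` are hypothetical names, and the real bucket function hashes the bucket key differently. Each OR branch is assumed to have the shape "partition test on a AND optional equality on b".

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntPredicate;

/** Illustrative sketch only; the names and the hash below are assumptions. */
final class PartitionAwareBucketSketch {

    static final int NUM_BUCKETS = 10;

    // Hypothetical stand-in for the production bucket hash function.
    static int bucketOf(int b) {
        return Math.floorMod(b, NUM_BUCKETS);
    }

    /** One OR branch: a partition test on 'a' plus an optional equality literal on 'b'. */
    static final class Branch {
        final IntPredicate partitionTest;
        final Integer bEquals; // null: the branch has no equality on b

        Branch(IntPredicate partitionTest, Integer bEquals) {
            this.partitionTest = partitionTest;
            this.bEquals = bEquals;
        }
    }

    /**
     * Returns the buckets to read for partition value a, or Optional.empty()
     * when some surviving branch does not pin b and no pruning is possible.
     */
    static Optional<Set<Integer>> bucketsFor(int a, List<Branch> orBranches) {
        Set<Integer> buckets = new TreeSet<>();
        for (Branch branch : orBranches) {
            if (!branch.partitionTest.test(a)) {
                continue; // the branch is false for this partition, so drop it
            }
            if (branch.bEquals == null) {
                return Optional.empty(); // e.g. "b < 100": any bucket may match
            }
            buckets.add(bucketOf(branch.bEquals));
        }
        return Optional.of(buckets);
    }
}
```

With the branches `(a < 3, b = 5)` and `(a = 3, b = 7)`, partitions 1 and 2 yield one bucket (the one for b=5), partition 3 yields the bucket for b=7, and partitions 4 and 5 yield an empty set, meaning nothing needs to be read there. Replacing the second branch with `(a = 3, b < 100)` makes partition 3 return `Optional.empty()`, so that partition is scanned without bucket pruning.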

Case 2: bucket filtering with compound predicates on a composite (multi-field) bucket key.

Table schema:

  • Partition key: column 'a' (INT)
  • Bucket key: columns 'b' and 'c' (composite, INT)
  • Bucket count: 10

Data distribution: 5 partitions (a=1 to 5) × 20 b-values (b=1 to 20) × 10 c-values (c=0 to 9) = 1000 rows.

Test scenarios:

  • Predicate: ((a < 3 AND b = 5) OR (a = 3 AND b = 7)) AND c = 5 - Tests nested OR within AND, with partition range, bucket field equality, and additional bucket field filter. The 'c = 5' condition is part of the composite bucket key, affecting bucket selection.
  • Predicate: ((a < 3 AND b = 5) OR (a = 3 AND b < 100)) AND c = 5 - Tests range predicate on one bucket field (b) combined with equality on another (c). Validates handling of multiple bucket key fields with different predicate types.
  • Predicate: ((a = 2 AND b = 5) OR (a = 3 AND b = 7)) AND c = 5 - Tests exact matching on both partition and bucket fields. The composite bucket key (b,c) ensures precise bucket targeting.
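
With a composite bucket key, a bucket can only be computed from a full (b, c) row, so the literals collected per field have to be combined. The sketch below shows one way to enumerate those combinations as a Cartesian product. The class and method names are hypothetical and this is not necessarily how the PR assembles rows; it only illustrates why every bucket-key field needs a known set of values before a bucket can be derived.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are assumptions, not Paimon API. */
final class BucketKeyCombinations {

    /**
     * Builds every combination of one literal per bucket-key field.
     * perFieldValues.get(i) holds the candidate literals for field i.
     */
    static List<List<Object>> assemble(List<List<Object>> perFieldValues) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(new ArrayList<>()); // start from a single empty row
        for (List<Object> values : perFieldValues) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> prefix : rows) {
                for (Object value : values) {
                    List<Object> row = new ArrayList<>(prefix);
                    row.add(value); // extend the prefix with this field's literal
                    next.add(row);
                }
            }
            rows = next;
        }
        return rows;
    }
}
```

For partition a=2 and the predicate `((a = 2 AND b = 5) OR (a = 3 AND b = 7)) AND c = 5`, the per-field literals are b: [5] and c: [5], giving the single row (5, 5). If b had two candidates, say [5, 7], the result would be (5, 5) and (7, 5); each row is then hashed to a bucket.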

Tests

API and Format

Documentation

Generative AI tooling

JingsongLi changed the title from "[core] Introduce BucketSelector by partition value" to "[core] Introduce BucketSelector based on partition values to achieve bucket level predicate push down" on Mar 19, 2026

Copilot AI left a comment

Pull request overview

Introduces a partition-aware BucketSelector to enable bucket-level predicate pushdown for compound predicates by evaluating partition predicates against concrete partition values during scan planning.

Changes:

  • Add BucketSelector + PartitionValuePredicateVisitor and wire full-predicate propagation via withCompleteFilter to enable partition-aware bucket pruning.
  • Update bucket filtering plumbing to use TriFilter<BinaryRow, Integer, Integer> (partition, bucket, totalBucket) end-to-end.
  • Add new unit/integration tests covering compound predicate bucket pruning (single-field and composite bucket keys).
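
To make the (partition, bucket, totalBucket) shape concrete, here is a minimal sketch of a three-argument filter and one way it could be used. `TriFilterSketch` and `TriFilterDemo` are hypothetical names, the partition is modeled as a plain `Integer` instead of a `BinaryRow`, and the modulo hash is a stand-in; the actual `TriFilter` in paimon-common may declare additional members.

```java
/** Sketch of a three-argument filter; not the actual TriFilter source. */
@FunctionalInterface
interface TriFilterSketch<T, U, R> {
    boolean test(T t, U u, R r);
}

final class TriFilterDemo {

    // Hypothetical stand-in for the production bucket hash function.
    static int bucketOf(int key, int totalBuckets) {
        return Math.floorMod(key, totalBuckets);
    }

    /**
     * Keeps only the bucket for b = 7 in partition a = 3.
     * All other partitions are left unpruned in this example.
     */
    static TriFilterSketch<Integer, Integer, Integer> example() {
        return (partition, bucket, totalBuckets) ->
                partition != 3 || bucket == bucketOf(7, totalBuckets);
    }
}
```

Passing the total bucket count alongside the bucket ID lets the filter compute the expected bucket for whatever bucket count the entry was written with, and passing the partition lets the same filter give different answers per partition.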

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| paimon-core/src/main/java/org/apache/paimon/operation/BucketSelector.java | New partition-aware bucket selector that derives candidate buckets from predicates. |
| paimon-core/src/main/java/org/apache/paimon/operation/BucketSelectConverter.java | Refactored to return a BucketSelector (TriFilter) when eligible. |
| paimon-core/src/main/java/org/apache/paimon/table/source/snapshot/SnapshotReaderImpl.java | Passes the full predicate to scans via withCompleteFilter. |
| paimon-core/src/main/java/org/apache/paimon/operation/FileStoreScan.java | Adds withCompleteFilter and switches total-aware bucket filter to TriFilter. |
| paimon-core/src/main/java/org/apache/paimon/operation/AppendOnlyFileStoreScan.java | Implements withCompleteFilter to install bucket-level pruning. |
| paimon-core/src/main/java/org/apache/paimon/operation/KeyValueFileStoreScan.java | Implements withCompleteFilter to install bucket-level pruning. |
| paimon-core/src/main/java/org/apache/paimon/operation/AbstractFileStoreScan.java | Threads partition value into bucket filtering during manifest entry filtering. |
| paimon-core/src/main/java/org/apache/paimon/manifest/BucketFilter.java | Bucket filter now tests with (partition, bucket, totalBucket) via TriFilter. |
| paimon-core/src/main/java/org/apache/paimon/manifest/ManifestEntryCache.java | Applies bucket filtering with partition context when scanning cached segments. |
| paimon-core/src/main/java/org/apache/paimon/AppendOnlyFileStore.java | Constructs the new BucketSelectConverter instance. |
| paimon-core/src/main/java/org/apache/paimon/KeyValueFileStore.java | Constructs the new BucketSelectConverter instance. |
| paimon-common/src/main/java/org/apache/paimon/utils/TriFilter.java | New 3-arg filter functional interface used for bucket pruning. |
| paimon-common/src/main/java/org/apache/paimon/predicate/PartitionValuePredicateVisitor.java | New visitor that evaluates partition-only leaf predicates against a concrete partition row. |
| paimon-common/src/main/java/org/apache/paimon/predicate/PredicateReplaceVisitor.java | Uses PredicateBuilder.and/or to simplify rebuilt compound predicates. |
| paimon-core/src/test/java/org/apache/paimon/table/BucketFilterScanTest.java | New integration test validating bucket pruning under compound predicates (single/composite keys). |
| paimon-core/src/test/java/org/apache/paimon/operation/BucketSelectorTest.java | New unit tests for bucket selection behavior across predicate patterns and partitioned tables. |
| paimon-common/src/test/java/org/apache/paimon/predicate/PartitionValuePredicateVisitorTest.java | New unit tests validating predicate rewriting for partition values. |
| paimon-core/src/test/java/org/apache/paimon/operation/BucketSelectConverterTest.java | Removed (replaced by BucketSelectorTest). |
| paimon-core/src/test/java/org/apache/paimon/manifest/BucketFilterTest.java | Removed (behavior now exercised via updated scan/bucket selector tests). |

Comment on lines +159 to +168
    int rowCount = 1;
    for (List<Object> values : bucketValues) {
        if (values == null) {
            return Optional.empty();
        }

        rowCount *= values.size();
        if (rowCount > MAX_VALUES) {
            return Optional.empty();
        }

Copilot AI Mar 20, 2026

In BucketSelector, rowCount *= values.size() uses int multiplication and can overflow into a negative value (e.g., two large IN-lists on a composite bucket key). That can bypass the rowCount > MAX_VALUES guard and then assembleRows may attempt to enumerate an enormous Cartesian product, risking hangs/OOM. Consider switching rowCount to long (or using checked multiplication) and short-circuiting as soon as the product exceeds MAX_VALUES (ideally also cap/dedup per-column value collection early to avoid holding huge literal lists).

Suggested change
-   int rowCount = 1;
-   for (List<Object> values : bucketValues) {
-       if (values == null) {
-           return Optional.empty();
-       }
-       rowCount *= values.size();
-       if (rowCount > MAX_VALUES) {
-           return Optional.empty();
-       }
+   long rowCount = 1L;
+   for (List<Object> values : bucketValues) {
+       if (values == null) {
+           return Optional.empty();
+       }
+       int size = values.size();
+       if (size <= 0) {
+           // No possible values for this column, no matching rows
+           return Optional.empty();
+       }
+       // Check for overflow and limit total number of combinations
+       if (rowCount > (long) MAX_VALUES / size) {
+           return Optional.empty();
+       }
+       rowCount *= size;
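
The difference between the two guards can be reproduced in isolation. The sketch below is a standalone illustration, not code from the PR: `CombinationGuard` is a hypothetical class, the list sizes are passed in directly, and `MAX_VALUES = 1000` is an assumed value chosen for the example.

```java
/** Standalone illustration of the overflow; names and the limit are assumptions. */
final class CombinationGuard {

    static final int MAX_VALUES = 1000; // assumed limit for this example

    /** Original pattern: multiply first in int arithmetic, then compare. */
    static boolean unsafeWithinLimit(int[] sizes) {
        int rowCount = 1;
        for (int size : sizes) {
            rowCount *= size; // may wrap around to a negative number
            if (rowCount > MAX_VALUES) {
                return false;
            }
        }
        return true;
    }

    /** Suggested pattern: compare against MAX_VALUES / size before multiplying. */
    static boolean safeWithinLimit(int[] sizes) {
        long rowCount = 1L;
        for (int size : sizes) {
            if (size <= 0) {
                return false; // an empty value list is rejected, as in the suggestion
            }
            if (rowCount > (long) MAX_VALUES / size) {
                return false; // rowCount * size would exceed MAX_VALUES
            }
            rowCount *= size;
        }
        return true;
    }
}
```

With sizes `{1000, 3_000_000}` the int product wraps to a negative value, so `unsafeWithinLimit` accepts the input, while `safeWithinLimit` rejects it. For integer operands, `rowCount > MAX_VALUES / size` holds exactly when `rowCount * size > MAX_VALUES`, so the check is neither stricter nor looser than the intended limit.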

Comment on lines +103 to +113
        builder.option(BUCKET_KEY.key(), "b");
    }
    Schema schema = builder.build();

    Identifier tableId = identifier("test_bucket_filter");
    catalog.createTable(tableId, schema, false);
    Table table = catalog.getTable(tableId);

    // ---- write data: 5 partitions × 20 b-values = 100 rows ----
    GenericRow[] rows = new GenericRow[100];
    int idx = 0;

Copilot AI Mar 20, 2026

These assertions hard-code specific bucket IDs (e.g., "3,1", "1,6"). That makes the test brittle to changes in the bucket hash implementation / BucketFunctionType defaults, even if bucket-level pruning is still correct. Consider computing expected bucket IDs using the same BucketFunction as production (and asserting on those), so the test validates pruning behavior without depending on a particular hash result.
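
One way to follow this suggestion is to build the expected "partition,bucket" strings from the bucket function itself. The sketch below is hypothetical: `ExpectedBucketsSketch` and `BUCKET_FN` are illustrative names, and the lambda is a placeholder where a test would plug in the same bucket function that production code uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

/** Illustrative sketch only; BUCKET_FN is a placeholder, not Paimon's hash. */
final class ExpectedBucketsSketch {

    // (bucketKey, numBuckets) -> bucket. A real test would delegate to the
    // production bucket function here instead of this stand-in.
    static final IntBinaryOperator BUCKET_FN =
            (key, numBuckets) -> Math.floorMod(key * 31, numBuckets);

    /** Formats "partition,bucket" for each {partition, bucketKey} pair. */
    static List<String> expected(int[][] partitionAndKey, int numBuckets) {
        List<String> result = new ArrayList<>();
        for (int[] pair : partitionAndKey) {
            result.add(pair[0] + "," + BUCKET_FN.applyAsInt(pair[1], numBuckets));
        }
        return result;
    }
}
```

For the first scenario the test would call `expected(new int[][] {{1, 5}, {2, 5}, {3, 7}}, 10)` and compare the result with the buckets actually planned, so a change in the hash implementation changes both sides of the assertion together.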


package org.apache.paimon.utils;

/** Represents a filter (boolean-valued function) of three argument. */

Copilot AI Mar 20, 2026

Typo/grammar in the Javadoc: "filter ... of three argument" should be "... of three arguments".

Suggested change
-   /** Represents a filter (boolean-valued function) of three argument. */
+   /** Represents a filter (boolean-valued function) of three arguments. */

Comment on lines +188 to +200
    }
    Schema schema = builder.build();

    Identifier tableId = identifier("test_composite_bucket_filter");
    catalog.createTable(tableId, schema, false);
    Table table = catalog.getTable(tableId);

    // ---- write data: 5 partitions × 20 b-values x 10 c-values = 1000 rows ----
    GenericRow[] rows = new GenericRow[1000];
    int idx = 0;
    for (int a = 1; a <= 5; a++) {
        for (int b = 1; b <= 20; b++) {
            for (int c = 0; c < 10; c++) {

Copilot AI Mar 20, 2026

Same brittleness here: the expected results are hard-coded bucket IDs for composite keys (e.g., "3,9", "3,0"...). To keep the test stable across bucket hash / BucketFunction changes, consider deriving expected bucket IDs via BucketFunction from the (b,c) literals instead of asserting specific numeric buckets.

@JingsongLi JingsongLi closed this Mar 20, 2026
@JingsongLi JingsongLi reopened this Mar 20, 2026
jerry-024 (Contributor) commented:

+1

@JingsongLi JingsongLi merged commit 974b725 into apache:master Mar 20, 2026
19 of 25 checks passed
3 participants